Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
We study the differentially private (DP) empirical risk minimization (ERM) problem under the semi-sensitive DP setting where only some features are sensitive. This generalizes the Label DP setting where only the label is sensitive. We give improved upper and lower bounds on the excess risk for DP-ERM. In particular, we show that the error only scales polylogarithmically in terms of the sensitive domain size, improving upon previous results that scale polynomially in the sensitive domain size (more » « lessFree, publicly-accessible full text available August 29, 2026
-
We study the role of interaction in the Common Randomness Generation (CRG) and Secret Key Generation (SKG) problems. In the CRG problem, two players, Alice and Bob, respectively get samples X1, X2, . . . and Y1, Y2, . . . with the pairs (X1, Y1), (X2, Y2), . . . being drawn independently from some known probability distribution μ. They wish to communicate so as to agree on L bits of randomness. The SKG problem is the restriction of the CRG problem to the case where the key is required to be close to random even to an eavesdropper who can listen to their communication (but does not have access to the inputs of Alice and Bob). In this work, we study the relationship between the amount of communication and the number of rounds of interaction in both the CRG and the SKG problems. Specifically, we construct a family of distributions μ = μr,n,L, parametrized by integers r, n and L, such that for every r there exists a constant b = b(r) for which CRG (respectively SKG) is feasible when (Xi, Yi) ~ μr,n,L with r + 1 rounds of communication, each consisting of O(log n) bits, but when restricted to r/2 − 2 rounds of interaction, the total communication must exceed Ω(n/ logb(n)) bits. Prior to our work no separations were known for r ≥ 2.more » « less
-
We introduce a new technique for reducing the dimension of the ambient space of low-degree polynomials in the Gaussian space while preserving their relative correlation structure. As an application, we obtain an explicit upper bound on the dimension of an epsilon-optimal noise-stable Gaussian partition. In fact, we address the more general problem of upper bounding the number of samples needed to epsilon-approximate any joint distribution that can be non-interactively simulated from a correlated Gaussian source. Our results significantly improve (from Ackermann-like to "merely" exponential) the upper bounds recently proved on the above problems by De, Mossel & Neeman [CCC 2017, SODA 2018 resp.] and imply decidability of the larger alphabet case of the gap non-interactive simulation problem posed by Ghazi, Kamath & Sudan [FOCS 2016]. Our technique of dimension reduction for low-degree polynomials is simple and can be seen as a generalization of the Johnson-Lindenstrauss lemma and could be of independent interest.more » « less
-
We study common randomness where two parties have access to i.i.d. samples from a known random source, and wish to generate a shared random key using limited (or no) communication with the largest possible probability of agreement. This problem is at the core of secret key generation in cryptography, with connections to communication under uncertainty and locality sensitive hashing. We take the approach of treating correlated sources as a critical resource, and ask whether common randomness can be generated resource-efficiently. We consider two notable sources in this setup arising from correlated bits and correlated Gaussians. We design the first explicit schemes that use only a polynomial number of samples (in the key length) so that the players can generate shared keys that agree with constant probability using optimal communication. The best previously known schemes were both non-constructive and used an exponential number of samples. In the amortized setting, we characterize the largest achievable ratio of key length to communication in terms of the external and internal information costs, two well-studied quantities in theoretical computer science. In the relaxed setting where the two parties merely wish to improve the correlation between the generated keys of length k, we show that there are no interactive protocols using o(k) bits of communication having agreement probability even as small as 2–o(k). For the related communication problem where the players wish to compute a joint function f of their inputs using i.i.d samples from a known source, we give a simultaneous message passing protocol using 2O(c) bits where c is the interactive randomized public-coin communication complexity of f. This matches the lower bound shown previously while the best previously known upper bound was doubly exponential in c. Our schemes reveal a new connection between common randomness and unbiased error-correcting codes, e.g., dual-BCH codes and their analogues in Euclidean space. Read More: https://epubs.siam.org/doi/10.1137/1.9781611975031.120more » « less
-
Several well-studied models of access to data samples, including statistical queries, local differential privacy and low-communication algorithms rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of $$\E_{x\sim D}[q(x)]$$ for any choice of the query function $$q:X\rightarrow \R$$, where $$D$$ is an unknown data distribution.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using $$k$$-wise queries each of which is specified by a function $$q:X^k\rightarrow \R$$. Hence it is natural to ask whether algorithms using $$k$$-wise queries can solve learning problems more efficiently and by how much. Blum, Kalai, Wasserman~\cite{blum2003noise} showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with $$k$$-wise SQs is smaller than the (unary) SQ complexity by a factor of at most $2^k$. We show that for more general problems over distributions the picture is substantially richer. For every $$k$$, the complexity of distribution-independent PAC learning with $$k$$-wise queries can be exponentially larger than learning with $(k+1)$-wise queries. We then give two approaches for simulating a $$k$$-wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum \etal \cite{blum2003noise}. It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using $$k$$-wise queries. The second approach exploits the $$k$$-party communication complexity of the $$k$$-wise query function.more » « less
An official website of the United States government

Full Text Available